Disambiguating Visual Verbs
Abstract
In this article, we introduce a new task, visual sense disambiguation for verbs: given an image and a verb, assign the correct sense of the verb, i.e., the one that describes the action depicted in the image. Just as textual word sense disambiguation is useful for a wide range of NLP tasks, visual sense disambiguation can be useful for multimodal tasks such as image retrieval, image description, and text illustration. We introduce a new dataset, VerSe (short for Verb Sense), which augments two existing multimodal datasets (COCO and TUHOI) with verb and sense labels. We explore supervised and unsupervised models for the sense disambiguation task using textual, visual, and multimodal embeddings. We also consider a scenario in which the verb depicted in an image must be detected before its sense can be predicted (i.e., no verbal information is associated with the image). We find that textual embeddings perform well when gold-standard annotations (object labels and image descriptions) are available, while multimodal embeddings perform well on unannotated images. VerSe is publicly available at https://github.com/spandanagella/verse.
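To make the task concrete, here is a minimal sketch of one possible unsupervised setup. It is illustrative only and is not the authors' model. It assumes the following have already been mapped to vectors:

* each candidate sense of the verb (for example, a text embedding of its definition);
* the image (for example, a text embedding of its object labels and description, a visual feature vector, or both).

Under these assumptions, the sketch assigns the sense whose vector has the highest cosine similarity to the image vector. The function names, the toy sense labels `play_sport` and `play_music`, and the concatenation-based `fuse` step are hypothetical choices made for this example.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(text_vec: Vector, visual_vec: Vector) -> List[float]:
    """Hypothetical multimodal embedding: concatenate the L2-normalized
    textual and visual vectors so neither modality dominates by scale."""
    return l2_normalize(text_vec) + l2_normalize(visual_vec)


def disambiguate(image_vec: Vector, sense_vecs: Dict[str, Vector]) -> str:
    """Unsupervised assignment: return the sense id whose embedding is
    most cosine-similar to the image embedding."""
    if not sense_vecs:
        raise ValueError("verb has no candidate senses")
    return max(sense_vecs, key=lambda s: cosine(image_vec, sense_vecs[s]))


if __name__ == "__main__":
    # Toy 2-d "textual" and 2-d "visual" vectors for the verb "play" (made-up data).
    senses = {
        "play_sport": fuse([1.0, 0.0], [0.0, 1.0]),
        "play_music": fuse([0.0, 1.0], [1.0, 0.0]),
    }
    image = fuse([0.9, 0.1], [0.2, 0.8])
    print(disambiguate(image, senses))  # play_sport
```

The same `disambiguate` function works unchanged whether the vectors are textual, visual, or fused. Only the embedding step differs, and that is the dimension along which the abstract compares textual, visual, and multimodal representations. A supervised variant would instead train a classifier on labeled image–sense pairs.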